NSF PAR Search | NSF Public Access Repository


Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs

Wadhwa, Somin; DeYoung, Jay; Nye, Benjamin; Amir, Silvio; Wallace, Byron C (August 2024, Machine Learning for Healthcare (MLHC))

Full Text Available
On-the-fly Definition Augmentation of LLMs for Biomedical NER

Munnangi, Monica; Feldman, Sergey; Wallace, Byron; Amir, Silvio; Hope, Tom; Naik, Aakanksha (June 2024, North American Chapter of the Association for Computational Linguistics)

Full Text Available
Investigating Mysteries of CoT-Augmented Distillation

https://doi.org/10.18653/v1/2024.emnlp-main.349

Wadhwa, Somin; Amir, Silvio; Wallace, Byron C (January 2024, Association for Computational Linguistics)

Eliciting chain of thought (CoT) rationales (sequences of tokens that convey a “reasoning” process) has been shown to consistently improve LLM performance on tasks like question answering. More recent efforts have shown that such rationales can also be used for model distillation: including CoT sequences (elicited from a large “teacher” model) in addition to target labels when fine-tuning a small student model yields (often substantial) improvements. In this work we ask: Why and how does this additional training signal help in model distillation? We perform ablations to interrogate this, and report some potentially surprising results. Specifically: (1) Placing CoT sequences after labels (rather than before) realizes consistently better downstream performance – this means that no student “reasoning” is necessary at test time to realize gains. (2) When rationales are appended in this way, they need not be coherent reasoning sequences to yield improvements; performance increases are robust to permutations of CoT tokens, for example. In fact, (3) a small number of key tokens are sufficient to achieve improvements equivalent to those observed when full rationales are used in model distillation.
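The target formats contrasted in this abstract can be illustrated with a minimal sketch. The "Answer:" / "Rationale:" template and the function names below are hypothetical, chosen only for illustration; the record does not specify the authors' actual formatting. The sketch shows a label-first target, a rationale-first target, and a token-permuted rationale.

```python
import random


def build_target(label: str, rationale: str, cot_after_label: bool = True) -> str:
    """Format one student fine-tuning target.

    The template is an assumption made for illustration, not the paper's format.
    """
    if cot_after_label:
        # Label first: the student can emit the answer without generating a rationale.
        return f"Answer: {label} Rationale: {rationale}"
    # Rationale first: the student must generate the rationale before the answer.
    return f"Rationale: {rationale} Answer: {label}"


def shuffle_rationale(rationale: str, seed: int = 0) -> str:
    """Return the rationale with its whitespace-separated tokens permuted."""
    tokens = rationale.split()
    rng = random.Random(seed)  # seeded so the permutation is reproducible
    rng.shuffle(tokens)
    return " ".join(tokens)
```

With label-first targets, decoding can stop once the label has been produced, which is one way to read the abstract's remark that no student "reasoning" is needed at test time.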
Full Text Available
Open (Clinical) LLMs are Sensitive to Instruction Phrasings

https://doi.org/10.18653/v1/2024.bionlp-1.5

Ceballos-Arroyo, Alberto Mario; Munnangi, Monica; Sun, Jiuding; Zhang, Karen; McInerney, Jered; Wallace, Byron C; Amir, Silvio (January 2024, Association for Computational Linguistics)

Instruction-tuned Large Language Models (LLMs) can perform a wide range of tasks given natural language instructions to do so, but they are sensitive to how such instructions are phrased. This issue is especially concerning in healthcare, as clinicians are unlikely to be experienced prompt engineers and the potential consequences of inaccurate outputs are heightened in this domain. This raises a practical question: How robust are instruction-tuned LLMs to natural variations in the instructions provided for clinical NLP tasks? We collect prompts from medical doctors across a range of tasks and quantify the sensitivity of seven LLMs—some general, others specialized—to natural (i.e., non-adversarial) instruction phrasings. We find that performance varies substantially across all models, and that—perhaps surprisingly—domain-specific models explicitly trained on clinical data are especially brittle, compared to their general domain counterparts. Further, arbitrary phrasing differences can affect fairness, e.g., valid but distinct instructions for mortality prediction yield a range both in overall performance, and in terms of differences between demographic groups.
Full Text Available
Revisiting Relation Extraction in the era of Large Language Models

https://doi.org/10.18653/v1/2023.acl-long.868

Wadhwa, Somin; Amir, Silvio; Wallace, Byron (January 2023, Association for Computational Linguistics)

Full Text Available
RedHOT: A Corpus of Annotated Medical Questions, Experiences, and Claims on Social Media

https://doi.org/10.18653/v1/2023.findings-eacl.61

Wadhwa, Somin; Khetan, Vivek; Amir, Silvio; Wallace, Byron (January 2023, Association for Computational Linguistics)

Full Text Available
On the Impact of Random Seeds on the Fairness of Clinical Classifiers

Amir, Silvio; van de Meent, Jan-Willem; Wallace, Byron C. (April 2021, North American Chapter of the Association for Computational Linguistics (NAACL))
Recent work has shown that fine-tuning large networks is surprisingly sensitive to changes in random seed(s). We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records (EHR) in MIMIC-III, the standard dataset in clinical NLP research. Apparent subgroup performance varies substantially for seeds that yield similar overall performance, although there is no evidence of a trade-off between overall and subgroup performance. However, we also find that the small sample sizes inherent to looking at intersections of minority groups and somewhat rare conditions limit our ability to accurately estimate disparities. Further, we find that jointly optimizing for high overall performance and low disparities does not yield statistically significant improvements. Our results suggest that fairness work using MIMIC-III should carefully account for variations in apparent differences that may arise from stochasticity and small sample sizes.
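The kind of comparison described in this abstract can be sketched as follows: compute a subgroup performance gap separately for each random seed, then look at how much that gap varies. This is an illustrative sketch only. Accuracy is used as a stand-in metric (an assumption; the record does not state which metric the authors used), and all function and parameter names are hypothetical.

```python
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def subgroup_gap(y_true, y_pred, groups, group_a, group_b):
    """Accuracy on group_a minus accuracy on group_b."""
    def acc_for(group):
        idx = [i for i, g in enumerate(groups) if g == group]
        return accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return acc_for(group_a) - acc_for(group_b)


def gap_range_across_seeds(y_true, preds_by_seed, groups, group_a, group_b):
    """Return (min, max) of the subgroup gap over the models trained with each seed.

    preds_by_seed maps a seed to that model's predictions on the same test set.
    """
    gaps = [
        subgroup_gap(y_true, preds, groups, group_a, group_b)
        for preds in preds_by_seed.values()
    ]
    return min(gaps), max(gaps)
```

A wide range between the minimum and maximum gap, for seeds with similar overall accuracy, would correspond to the seed-driven variation in apparent disparities that the abstract reports. With small subgroups, each per-seed gap is itself a noisy estimate.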
Full Text Available
